distributional model
Distributional Diffusion Models with Scoring Rules
De Bortoli, Valentin, Galashov, Alexandre, Guntupalli, J. Swaroop, Zhou, Guangyao, Murphy, Kevin, Gretton, Arthur, Doucet, Arnaud
Diffusion models generate high-quality synthetic data. They operate by defining a continuous-time forward process which gradually adds Gaussian noise to data until fully corrupted. The corresponding reverse process progressively "denoises" a Gaussian sample into a sample from the data distribution. However, generating high-quality outputs requires many discretization steps to obtain a faithful approximation of the reverse process. This is expensive and has motivated the development of many acceleration methods. We propose to accomplish sample generation by learning the posterior {\em distribution} of clean data samples given their noisy versions, instead of only the mean of this distribution. This allows us to sample from the probability transitions of the reverse process on a coarse time scale, significantly accelerating inference with minimal degradation of the quality of the output. This is accomplished by replacing the standard regression loss used to estimate conditional means with a scoring rule. We validate our method on image and robot trajectory generation, where we consistently outperform standard diffusion models at few discretization steps.
Distributional Semantics, Holism, and the Instability of Meaning
Grindrod, Jumbly, Porter, J. D., Hansen, Nat
Current language models are built on the so-called distributional semantic approach to linguistic meaning that has the distributional hypothesis at its core. The distributional hypothesis involves a holistic conception of word meaning: the meaning of a word depends upon its relations to other words in the model. A standard objection to meaning holism is the charge of instability: any change in the meaning properties of a linguistic system (a human speaker, for example) would lead to many changes or possibly a complete change in the entire system. When the systems in question are trying to communicate with each other, it has been argued that instability of this kind makes communication impossible (Fodor and Lepore 1992, 1996, 1999). In this article, we examine whether the instability objection poses a problem for distributional models of meaning. First, we distinguish between distinct forms of instability that these models could exhibit, and we argue that only one such form is relevant for understanding the relation between instability and communication: what we call differential instability. Differential instability is variation in the relative distances between points in a space, rather than variation in the absolute position of those points. We distinguish differential and absolute instability by constructing two of our own models, a toy model constructed from the text of two novels, and a more sophisticated model constructed using the Word2vec algorithm from a combination of Wikipedia and SEP articles. We demonstrate the two forms of instability by showing how these models change as the corpora they are constructed from increase in size.
Montague semantics and modifier consistency measurement in neural language models
Carvalho, Danilo S., Manino, Edoardo, Rozanova, Julia, Cordeiro, Lucas, Freitas, André
In recent years, distributional language representation models have demonstrated great practical success. At the same time, the need for interpretability has elicited questions on their intrinsic properties and capabilities. Crucially, distributional models are often inconsistent when dealing with compositional phenomena in natural language, which has significant implications for their safety and fairness. Despite this, most current research on compositionality is directed towards improving their performance on similarity tasks only. This work takes a different approach, and proposes a methodology for measuring compositional behavior in contemporary language models. Specifically, we focus on adjectival modifier phenomena in adjective-noun phrases. We introduce three novel tests of compositional behavior inspired by Montague semantics. Our experimental results indicate that current neural language models behave according to the expected linguistic theories to a limited extent only. This raises the question of whether these language models are not able to capture the semantic properties we evaluated, or whether linguistic theories from Montagovian tradition would not match the expected capabilities of distributional models.
NLP-CIC @ DIACR-Ita: POS and Neighbor Based Distributional Models for Lexical Semantic Change in Diachronic Italian Corpora
Angel, Jason, Rodriguez-Diaz, Carlos A., Gelbukh, Alexander, Jimenez, Sergio
We present our systems and findings on unsupervised lexical semantic change for the Italian language in the DIACR-Ita shared-task at EVALITA 2020. The task is to determine whether a target word has evolved its meaning with time, only relying on raw-text from two time-specific datasets. We propose two models representing the target words across the periods to predict the changing words using threshold and voting schemes. Our first model solely relies on part-of-speech usage and an ensemble of distance measures. The second model uses word embedding representation to extract the neighbor's relative distances across spaces and propose "the average of absolute differences" to estimate lexical semantic change. Our models achieved competent results, ranking third in the DIACR-Ita competition. Furthermore, we experiment with the k_neighbor parameter of our second model to compare the impact of using "the average of absolute differences" versus the cosine distance used in Hamilton et al. (2016).
Semantic Relatedness and Taxonomic Word Embeddings
Kacmajor, Magdalena, Kelleher, John D., Klubicka, Filip, Maldonado, Alfredo
This paper 1 connects a series of papers dealing with taxonomic word embeddings. It begins by noting that there are different types of semantic relatedness and that different lexical representations encode different forms of relatedness. A particularly important distinction within semantic relatedness is that of thematic versus taxonomic relatedness. Next, we present a number of experiments that analyse taxonomic embeddings that have been trained on a synthetic corpus that has been generated via a random walk over a taxonomy. These experiments demonstrate how the properties of the synthetic corpus, such as the percentage of rare words, are affected by the shape of the knowledge graph the corpus is generated from. Finally, we explore the interactions between the relative sizes of natural and synthetic corpora on the performance of embeddings when taxonomic and thematic embeddings are combined.
A Typedriven Vector Semantics for Ellipsis with Anaphora using Lambek Calculus with Limited Contraction
Wijnholds, Gijs, Sadrzadeh, Mehrnoosh
We develop a vector space semantics for verb phrase ellipsis with anaphora using type-driven compositional distributional semantics based on the Lambek calculus with limited contraction (LCC) of J\"ager (2006). Distributional semantics has a lot to say about the statistical collocation-based meanings of content words, but provides little guidance on how to treat function words. Formal semantics on the other hand, has powerful mechanisms for dealing with relative pronouns, coordinators, and the like. Type-driven compositional distributional semantics brings these two models together. We review previous compositional distributional models of relative pronouns, coordination and a restricted account of ellipsis in the DisCoCat framework of Coecke et al. (2010, 2013). We show how DisCoCat cannot deal with general forms of ellipsis, which rely on copying of information, and develop a novel way of connecting typelogical grammar to distributional semantics by assigning vector interpretable lambda terms to derivations of LCC in the style of Muskens & Sadrzadeh (2016). What follows is an account of (verb phrase) ellipsis in which word meanings can be copied: the meaning of a sentence is now a program with non-linear access to individual word embeddings. We present the theoretical setting, work out examples, and demonstrate our results on a toy distributional model motivated by data.
A Generalised Quantifier Theory of Natural Language in Categorical Compositional Distributional Semantics with Bialgebras
Hedges, Jules, Sadrzadeh, Mehrnoosh
Categorical compositional distributional semantics is a model of natural language; it combines the statistical vector space models of words with the compositional models of grammar. We formalise in this model the generalised quantifier theory of natural language, due to Barwise and Cooper. The underlying setting is a compact closed category with bialgebras. We start from a generative grammar formalisation and develop an abstract categorical compositional semantics for it, then instantiate the abstract setting to sets and relations and to finite dimensional vector spaces and linear maps. We prove the equivalence of the relational instantiation to the truth theoretic semantics of generalised quantifiers. The vector space instantiation formalises the statistical usages of words and enables us to, for the first time, reason about quantified phrases and sentences compositionally in distributional semantics.
Commonsense Reasoning Based on Betweenness and Direction in Distributional Models
Schockaert, Steven (Cardiff University) | Derrac, Joaquín (Cardiff University)
Several recent approaches use distributional similarity for making symbolic reasoning more flexible. While an important step in the right direction, the use of similarity has a number of inherent limitations. We argue that similarity-based reasoning should be complemented with commonsense reasoning patterns such as interpolation and a fortiori inference. We show how the required background knowledge for these inference patterns can be obtained from distributional models.
Reasoning about Meaning in Natural Language with Compact Closed Categories and Frobenius Algebras
Kartsaklis, Dimitri, Sadrzadeh, Mehrnoosh, Pulman, Stephen, Coecke, Bob
Compact closed categories have found applications in modeling quantum information protocols by Abramsky-Coecke. They also provide semantics for Lambek's pregroup algebras, applied to formalizing the grammatical structure of natural language, and are implicit in a distributional model of word meaning based on vector spaces. Specifically, in previous work Coecke-Clark-Sadrzadeh used the product category of pregroups with vector spaces and provided a distributional model of meaning for sentences. We recast this theory in terms of strongly monoidal functors and advance it via Frobenius algebras over vector spaces. The former are used to formalize topological quantum field theories by Atiyah and Baez-Dolan, and the latter are used to model classical data in quantum protocols by Coecke-Pavlovic-Vicary. The Frobenius algebras enable us to work in a single space in which meanings of words, phrases, and sentences of any structure live. Hence we can compare meanings of different language constructs and enhance the applicability of the theory. We report on experimental results on a number of language tasks and verify the theoretical predictions.
Compositional Operators in Distributional Semantics
The recent developments on the syntactical and morphological analysis of natural language text constitute the first step towards a more ambitious goal, that of assigning a proper form of meaning to arbitrary text compounds. Indeed, for certain really "intelligent" applications, such as machine translation, question-answering systems, paraphrase detection, or automatic essay scoring, to name just a few, there will always exist a gap between raw linguistic information (such as part-of-speech labels, for example) and the knowledge of the real world that is needed for the completion of the task in a satisfactory way. Semantic analysis has exactly this role, aiming to close (or reduce as much as possible) this gap by linking the linguistic information with semantic representations that embody this elusive real-world knowledge. The traditional way of adding semantics to sentences is a syntax-driven compositional approach: every word in the sentence is associated with a primitive symbol or a predicate, and these are combined to larger and larger logical forms based on the syntactical rules of the grammar. At the end of the syntactical analysis, the logical representation of the whole sentence is a complex formula that can be fed to a theorem prover for further processing. Although such an approach seems intuitive, it has been shown that it is rather inefficient for any practical application (for example, Bos and Markert (2006) get very low recall scores for a textual entailment task).